Plastic action-selection networks for neuromorphic hardware

ABSTRACT

A neural model for reinforcement-learning and for action-selection includes a plurality of channels, a population of input neurons in each of the channels, a population of output neurons in each of the channels, each population of input neurons in each of the channels coupled to each population of output neurons in each of the channels, and a population of reward neurons in each of the channels. Each channel of a population of reward neurons receives input from an environmental input, and is coupled only to output neurons in a channel that the reward neuron is part of. If the environmental input for a channel is positive, the corresponding channel of a population of output neurons are rewarded and have their responses reinforced, otherwise the corresponding channel of a population of output neurons are punished and have their responses attenuated.

CROSS REFERENCE TO RELATED APPLICATIONS

The present application is related to and claims priority to U.S. Provisional Patent Application Ser. No. 61/732,590 filed on Dec. 3, 2012, which is hereby incorporated by reference in its entirety.

STATEMENT REGARDING FEDERAL FUNDING

This invention was made under U.S. Government contract DARPA SyNAPSE HR0011-09-C-0001. The U.S. Government has certain rights in this invention.

TECHNICAL FIELD

This disclosure relates to neural networks, and in particular to neural networks capable of action-selection and reinforcement-learning.

BACKGROUND

In the prior art, neural networks capable of action-selection have been well characterized, as have those that demonstrate reinforcement-learning. However, in the prior art, action-selection and reinforcement-learning algorithms present complex solutions to the distal reward problem, which are not easily amenable to hardware implementations.

Barr, D., P. Dudek, J. Chambers, and K. Gurney describe in “Implementation of multi-layer leaky integrator networks on a cellular processor array” Neural Networks, 2007. IJCNN August 2007. International Joint Conference, pp. 1560-1565, a model of the basal ganglia on a neural processor array. The software neural model was capable of performing action selection. However, Barr et al. did not describe any inherent mechanisms for reinforcement-learning and the micro-channels of the basal ganglia were predefined.

Merolla, P., J. Arthur, F. Akopyan, N. Imam, R. Manohar, and D. Modha describe in “A digital neurosynaptic core using embedded crossbar memory with 45 pJ per spike in 45 nm” Custom Integrated Circuits Conference (CICC), September 2011 IEEE, pp. 1-4, a neuromorphic processor capable of playing a game of pong against a human opponent. However, the network was constructed off-line and, once programmed on the hardware, remained static.

What is needed is a neural network that implements action-selection and reinforcement-learning and that can be more readily implemented with hardware. The embodiments of the present disclosure answer these and other needs.

SUMMARY

In a first embodiment disclosed herein, a neural model for reinforcement-learning and for action-selection comprises a plurality of channels, a population of input neurons in each of the channels, a population of output neurons in each of the channels, each population of input neurons in each of the channels coupled to each population of output neurons in each of the channels and a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein each channel of reward neurons is coupled only to output neurons in a channel that the reward neuron is part of, wherein if the environmental input for a channel is positive, the corresponding channel of a population of output neurons are rewarded and have their responses reinforced, and wherein if the environmental input for a channel is negative, the corresponding channel of a population of output neurons are punished and have their responses attenuated.

In another embodiment disclosed herein, a neural model for reinforcement-learning and for action-selection comprises a plurality of channels, a population of input neurons in each of the channels, a population of output neurons in each of the channels, each population of input neurons in each of the channels coupled to each population of output neurons in each of the channels, a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein each channel of reward neurons is coupled only to output neurons in a channel that the reward neuron is part of, and a population of inhibition neurons in each of the channels, wherein each population of inhibition neurons receive an input from a population of output neurons in a same channel that the population of inhibition neurons is part of, and wherein a population of inhibition neurons in a channel has an output to output neurons in every other channel except the channel of which the inhibition neurons are part of, wherein if the environmental input to a population of reward neurons for a channel is positive, the corresponding channel of a population of output neurons are rewarded and have their responses reinforced, and wherein if the environmental input to a population of reward neurons for a channel is negative, the corresponding channel of a population of output neurons are punished and have their responses attenuated.

In yet another embodiment disclosed herein, a basal ganglia neural network model comprises a plurality of channels, a population of cortex neurons in each of the channels, a population of striatum neurons in each of the channels, each population of striatum neurons in each of the channels coupled to each population of cortex neurons in each of the channels, a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein each channel of reward neurons is coupled only to striatum neurons in a channel that the reward neuron is part of, and a population of Substantia Nigra pars reticulata (SNr) neurons in each of the channels, wherein each population of SNr neurons is coupled only to a population of striatum neurons in a channel that the SNr neurons are part of, wherein if the environmental input to a population of reward neurons for a channel is positive, the corresponding channel of a population of striatum neurons are rewarded and have their responses reinforced, wherein if the environmental input to a population of reward neurons for a channel is negative, the corresponding channel of a population of striatum neurons are punished and have their responses attenuated, and wherein each population of SNr neurons is tonically active and is suppressed by inhibitory afferents of striatum neurons in a channel that the SNr neurons are part of.

These and other features and advantages will become further apparent from the detailed description and accompanying figures that follow. In the figures and description, numerals indicate the various features, like numerals referring to like features throughout both the drawings and the description.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a neural network in accordance with the present disclosure;

FIG. 2 shows another neural network with lateral inhibition in accordance with the present disclosure;

FIG. 3 shows a basal ganglia neural network in accordance with the present disclosure;

FIGS. 4A to 4C show an example of a reward-learning scenario in accordance with the present disclosure;

FIGS. 5A to 5F show an example of synaptic weights for a neural network in accordance with the present disclosure;

FIG. 6 is a diagram showing a pong style virtual environment in accordance with the present disclosure;

FIGS. 7A to 7C, 8A to 8C, and 9A to 9C illustrate results for the pong style virtual environment of FIG. 6 for different spatial widths and time spans in accordance with the present disclosure; and

FIG. 10 illustrates the overall accuracy for the model with a spatial width of 0.025 in accordance with the present disclosure.

DETAILED DESCRIPTION

In the following description, numerous specific details are set forth to clearly describe various specific embodiments disclosed herein. One skilled in the art, however, will understand that the presently claimed invention may be practiced without all of the specific details discussed below. In other instances, well known features have not been described so as not to obscure the invention.

The combination of action-selection and reinforcement-learning in biological entities is essential for successfully adapting and thriving in any environment. This is also true for the successful operation of intelligent agents. Presented here are the design and implementation of biologically inspired action selection/reinforcement-networks for the control of an agent by a neuromorphic processor.

Embodied modeling can be described as the coupling of computational biology and engineering. Historically, strategies for embedding artificial intelligence have failed to result in agents with truly emergent properties. Because of this, it is still unreasonable to deploy a robotic entity and expect it to learn from its environment the way biological entities can. Similarly, neural models require complex and varied input signals in order to accurately replicate the activity observed in vivo. One method for creating these complex stimuli is through immersing a model in a real or virtual environment capable of providing feedback.

Conceptually, action selection is the arbitration of competing signals. In the mammalian nervous system the complex circuitry of the basal ganglia (BG) is active in gating the information flow in the frontal cortex by appropriately selecting between input signals. This selection mechanism can affect anything from simple actions up to complex behaviors and cognitive processing. Although overly simplified, it can be helpful to relate the BG to a circuit multiplexer that actively connects inputs to outputs based on the current system state.

Reinforcement or reward learning (RL) is the reinforcement of actions or decisions that maximize the positive outcome of those choices. This is similar to instrumental conditioning, where stimulus-response trials result in reinforcement of responses that are rewarded and attenuation of those that are not. Reinforcement-learning in a neural network is an ideal alternative to supervised learning algorithms. Where supervised learning requires an intelligent teaching signal that must have a detailed understanding of the task, reinforcement learning can develop independent of the task without any prior knowledge. Only the quality of the output signal in response to the input signal and current contextual state of the network is needed.

In an embodiment according to the present disclosure, neurons within a neural network may be modeled by a Leaky-Integrate and Fire (LIF) model. The LIF model is defined by equation 1.

$\begin{matrix}{{C_{m}\frac{V}{t}} = {{- {g_{leak}\left( {V - E_{rest}} \right)}} + {I.}}} & (1)\end{matrix}$

where

$C_{m}$ is the membrane capacitance,

$I$ is the sum of external and synaptic currents,

$g_{leak}$ is the conductance of the leak channels, and

$E_{rest}$ is the resting potential, that is, the reversal potential of the leak channels.

As the current input into the model neuron is increased, the membrane voltage will proportionally increase until a threshold voltage is reached. At this point an action potential is fired and the membrane voltage is reset to the resting value. The neuron model is then placed in a refractory period for 2 milliseconds, during which no changes in membrane voltage are allowed. If the current is removed before reaching the threshold, the voltage will decay to $E_{rest}$. The LIF model is one of the least computationally intensive neural models but is still capable of replicating many aspects of neural activity.
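The behavior described above can be sketched with a forward-Euler loop. This is only an illustration: the leak conductance `g_leak` and the threshold `v_thresh` are assumed values chosen for the example (the text does not give them), while the 1 pF capacitance, 0 mV resting value, 1 ms step, and 2 ms refractory period follow the description. Units are taken as pF, nS, mV, pA, and ms.

```python
def simulate_lif(current, steps, dt=1.0, c_m=1.0, g_leak=0.1,
                 e_rest=0.0, v_thresh=15.0, t_refrac=2.0):
    """Euler-integrate C_m dV/dt = -g_leak (V - E_rest) + I for a constant I.

    Returns (voltage trace, list of spike times in ms).
    g_leak and v_thresh are illustrative assumptions, not values from the text.
    """
    v = e_rest
    refrac_left = 0.0
    trace, spikes = [], []
    for step in range(steps):
        if refrac_left > 0.0:
            # Refractory period: the membrane voltage is held fixed.
            refrac_left -= dt
        else:
            dv = (-g_leak * (v - e_rest) + current) / c_m
            v += dt * dv
            if v >= v_thresh:
                # Threshold crossing: record a spike and reset to rest.
                spikes.append(step * dt)
                v = e_rest
                refrac_left = t_refrac
        trace.append(v)
    return trace, spikes
```

With these assumed values, a constant current whose steady-state voltage `current / g_leak` stays below `v_thresh` never produces a spike, while a larger current produces a regular spike train.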

The connections between neurons, or synapses, are modeled as conductance-based synapses. The general form of the synaptic influence is defined by equation 2.

$\begin{matrix}{g_{syn} = g_{max} \cdot g_{eff} \cdot \left(V - E_{syn}\right)} & (2)\end{matrix}$

where

$g_{max}$ is the maximum conductance for that particular class of synapse,

$g_{eff}$ is the current synaptic efficacy, in the range $[0, g_{effmax}]$, and

$E_{syn}$ is the reversal potential for that particular class of synapse.

To simulate the buffering and re-uptake of neurotransmitters, the influence that a presynaptic action potential has on a neuron can be decayed based on a specified time constant. This process is abstracted using equation 3.

$\begin{matrix}{\tau_{syn} = {\frac{g_{i}^{syn}}{t} = {{- g_{i}^{syn}} + {\sum\limits_{\;}^{\;}\; {W_{ji}{{\delta \left( {t - t_{j}} \right)}.}}}}}} & (3)\end{matrix}$

Learning at the synaptic level is achieved through the spike-timing dependent plasticity rules described in Song, S., K. D. Miller, and L. F. Abbott (2000), “Competitive Hebbian Learning through Spike-timing Dependent Synaptic Plasticity” Nature Neuroscience (9) 2000, pp. 919-926, as shown in equation 4.

$\begin{matrix}{g_{eff}\rightarrow g_{eff} + g_{effmax}F\left(\Delta t\right)} & (4)\end{matrix}$

where

$\Delta t = t_{pre} - t_{post}$

$F\left(\Delta t\right) = \left\{\begin{matrix}{A_{+}e^{(\frac{\Delta t}{\tau_{+}})}} \\ {A_{-}e^{(\frac{\Delta t}{\tau_{-}})}}\end{matrix}\right.$

if $\left(g_{eff} < 0\right)$ then $g_{eff}\rightarrow 0$

if $\left(g_{eff} > g_{effmax}\right)$ then $g_{eff}\rightarrow g_{effmax}$

The global parameter values that may be used in one embodiment are presented in Table 1. The governing equations are numerically integrated using Euler integration with a 1 millisecond (ms) time step.

TABLE 1 Global model parameters.

Parameter    Value
C_m          1. (pF)
τ_ge         5. (ms)
τ_gi         100. (ms)
E_exc        0. (mV)
E_inh        −80. (mV)
V_rest       0. (mV)
A_+          0.025
A_−          0.026
τ_+          20. (ms)
τ_−          20. (ms)
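The plasticity rule of equation 4 can be sketched with the Table 1 values as follows. The branch conditions and signs are an assumption taken from the convention of the cited Song et al. rule (potentiation when the presynaptic spike precedes the postsynaptic spike, depression otherwise); the text itself does not spell them out. The default `g_eff_max=1.0` and the function names are likewise illustrative.

```python
import math

A_PLUS, A_MINUS = 0.025, 0.026      # Table 1
TAU_PLUS, TAU_MINUS = 20.0, 20.0    # ms, Table 1


def stdp_window(dt_ms):
    """F(dt) with dt = t_pre - t_post.

    Assumed sign convention (Song et al.): pre-before-post (dt < 0)
    potentiates, post-before-pre (dt > 0) depresses.
    """
    if dt_ms < 0.0:
        return A_PLUS * math.exp(dt_ms / TAU_PLUS)
    if dt_ms > 0.0:
        return -A_MINUS * math.exp(-dt_ms / TAU_MINUS)
    return 0.0


def update_efficacy(g_eff, t_pre, t_post, g_eff_max=1.0):
    """Apply equation 4, then clip g_eff to the range [0, g_eff_max]."""
    g_eff += g_eff_max * stdp_window(t_pre - t_post)
    return min(max(g_eff, 0.0), g_eff_max)
```

For example, `update_efficacy(0.25, t_pre=10.0, t_post=15.0)` raises the efficacy above 0.25, while swapping the two spike times lowers it; the clipping keeps the result inside the allowed range in both cases.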

FIGS. 1 to 3 show three different neural network embodiments. Initially, each of these networks has no knowledge or inherent understanding of its environment. The behavior is learned through feedback from the environment in the form of reward and punishment signals encoded as either random or structured spike events. These signals strengthen or weaken the synaptic connections between neurons, reinforcing the appropriate action.

The first model, shown in FIG. 1, is a simple feed-forward network that consists entirely of excitatory neurons arranged into N channels. Each of the N channels has a population of input neurons 12, a population of output neurons 14, and a population of reward neurons 16.

In one embodiment the populations of input neurons 12 are connected with equal probability and equal conductance to all of the populations of output neurons 14, ensuring that there is no inherent bias to a particular input-output pair. In another embodiment, the populations of input neurons 12 are connected randomly to the populations of output neurons 14. This embodiment is particularly important for large-scale implementations of these networks, and for accommodating the limits on the number of afferents imposed by a neuromorphic architecture.

Each channel of a population of input neurons 12 is connected to each channel of a population of output neurons 14 by synapses 18. One set of parameters that may be used with the model of FIG. 1 is presented in Table 2. The synapse connections 18 between input neurons 12 and output neurons 14 are randomly created from the entire input neuron 12 population to ensure that there is no bias between input and output channels.

Reward neurons 16 receive input from environmental inputs 20, which may be sensed from the environment. Each channel of reward neurons 16 is coupled to only one corresponding channel of output neurons 14 via synapses 22. If the environmental inputs for a channel are positive, the corresponding channel of output neurons 14 is rewarded and has its responses reinforced. If the environmental inputs for a channel are negative, the corresponding channel of output neurons 14 is punished and has its responses attenuated.

The input neurons 12, the output neurons 14 and the reward neurons 16 may be modeled by the Leaky-Integrate and Fire (LIF) model defined by equation 1. The synapses 18 and 22 may be modeled by the spike-timing dependent plasticity (STDP) of equation 4.

TABLE 2 Parameters for the excitatory only network.

A. Neuron parameters

Neural Region    Neurons Per Channel
Input            3
Output           3
Reward           1

B. Connections

Source → Destination    Synaptic Conductance (g_max) · (g_eff)    Number of Incoming Connections (total)
Input → Output          (10.0) · (0.25)                           15
Reward → Input          (10.0) · (1.0)                            1
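One way to picture the wiring of FIG. 1 is as a list of synapses built from the per-channel population sizes of Table 2. The sketch below is a hypothetical construction, not the disclosed implementation: the channel count of 5, the tuple-based neuron labels, and the function name are assumptions, and the reward projection follows the prose description, in which reward neurons couple only to output neurons of their own channel.

```python
import random


def build_feedforward(n_channels=5, n_in=3, n_out=3, n_reward=1,
                      fan_in=15, g_eff0=0.25, seed=0):
    """Return a list of (source, destination, initial efficacy) synapses."""
    rng = random.Random(seed)
    inputs = [(c, i) for c in range(n_channels) for i in range(n_in)]
    synapses = []
    for c in range(n_channels):
        for o in range(n_out):
            dst = ("output", c, o)
            # Sources are drawn from the entire input population, so no
            # input/output channel pair is favoured at initialization.
            for sc, si in rng.sample(inputs, min(fan_in, len(inputs))):
                synapses.append((("input", sc, si), dst, g_eff0))
            # Reward neurons project only to outputs in their own channel.
            for r in range(n_reward):
                synapses.append((("reward", c, r), dst, 1.0))
    return synapses
```

Because every input-to-output synapse starts at the same efficacy, any channel preference that appears later comes from the reward-driven plasticity rather than from the initial wiring.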

FIG. 2 shows another neural network with lateral inhibition between the output populations in accordance with the present disclosure. The neural network of FIG. 2 creates an on-center off-surround network where the most active population suppresses the other output populations. Not only is this a more biologically realistic network but it also offers more control in the selection process. One set of parameters for this model may be the parameters shown in Table 3. A key aspect of the neural network is the diffuse connections of the inhibition neurons 36. Each channel of a population of inhibition neurons 36 projects to every other channel of output neurons 32, excluding the channel of which the population of inhibition neurons 36 is a part.

The neural network of FIG. 2 has N channels. Each of the N channels has a population of input neurons 30, a population of output neurons 32, a population of reward neurons 34, and a population of inhibition neurons 36. Each channel of a population of input neurons 30 is connected to each channel of a population of output neurons 32 by synapses 38.

In one embodiment the populations of input neurons 30 are connected with equal probability and equal conductance to all of the populations of output neurons 32, ensuring that there is no inherent bias to a particular input-output pair. In another embodiment, the synapse connections 38 between the populations of input neurons 30 and the populations of output neurons 32 are created randomly from the entire population of input neurons 30.

Each channel of a population of reward neurons 34 receives inputs from environmental inputs 40, which may be sensed from the environment. Each channel of a population of reward neurons 34 is coupled to only one corresponding channel of a population of output neurons 32 via synapses 42. If the environmental inputs for a channel are positive, the corresponding channel of output neurons 32 is rewarded and has its responses reinforced. If the environmental inputs for a channel are negative, the corresponding channel of output neurons 32 is punished and has its responses attenuated.

Each channel of a population of output neurons 32 is connected by synapses 46 to a corresponding channel of a population of inhibition neurons 36. The inhibition neurons 36 in a channel are coupled via synapses 44 to output neurons 32 in every other channel; however, the inhibition neurons 36 in a channel are not coupled to output neurons 32 of the channel of which the inhibition neurons 36 are part.

As the responses from the output neurons 32 of a channel increase, the inhibition neurons 36 of that channel may, via the synapses 44, inhibit the responses from output neurons 32 in every other channel.
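The diffuse, self-excluding projection of the inhibition neurons can be written down compactly. The sketch below is a toy, rate-based illustration under assumed names (`inhibition_targets`, `inhibitory_drive`) and an assumed uniform weight; it is not the spiking implementation of FIG. 2.

```python
def inhibition_targets(n_channels):
    """Map each inhibition channel to the output channels it suppresses."""
    return {c: [k for k in range(n_channels) if k != c]
            for c in range(n_channels)}


def inhibitory_drive(output_rates, weight=1.0):
    """Total inhibition received by each output channel (toy rate model).

    Channel k is inhibited by every channel except itself, so its drive is
    the summed activity of all other channels.
    """
    total = sum(output_rates)
    return [weight * (total - r) for r in output_rates]
```

With output rates of `[5, 1, 1]`, the most active channel receives the least inhibition and the weaker channels receive the most, which is the on-center off-surround effect described above.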

The input neurons 30, the output neurons 32, the reward neurons 34, and the inhibition neurons 36 may be modeled by the Leaky-Integrate and Fire (LIF) model defined by equation 1. The synapses 38, 42, 44 and 46 may be modeled by the spike-timing dependent plasticity (STDP) of equation 4.

TABLE 3 Parameters for the lateral-inhibition network.

A. Neuron parameters

Neural Region    Neurons Per Channel
Input            3
Output           3
Inhibition       3
Reward           1

B. Connections

Source → Destination    Synaptic Conductance (g_max) · (g_eff)    Number of Incoming Connections (total)
Input → Output          (10.0) · (0.25)                           15
Output → Inhibition     (10.0) · (1.0)                            15
Inhibition → Output     (10.0) · (1.0)                            15
Reward → Input          (10.0) · (1.0)                            1

FIG. 3 shows a basal ganglia (BG) neural network in accordance with the present disclosure. The neural network of FIG. 3 emulates the physiological activity of the BG direct pathway, where the Substantia Nigra pars reticulata (SNr) neurons 56 are tonically active, firing around 30 Hz. The substantia nigra is part of the basal ganglia and the pars reticulata is part of the substantia nigra. The basal activity of the SNr neurons 56 is suppressed by the inhibitory afferents of the striatum neurons 52, resulting in a disinhibitory mechanism of action. Learning occurs between the cortex neurons 50 and the striatum neurons 52 to develop the appropriate input-output channel combinations. One set of parameters that may be used with this model is shown in Table 4.

TABLE 4 Parameters for the basal ganglia direct pathway.

A. Neuron parameters

Neural Region                              Neurons Per Channel
Cortex (Ctx)                               4
Striatum (Str)                             3
Substantia Nigra pars reticulata (SNr)     3
Excitatory                                 9
Reward                                     6

B. Connections

Source → Destination    Synaptic Conductance    Number of Incoming Connections (per channel)
Ctx → Str               0.1                     4
Str → Str (diffuse)     10.0                    3
Excitatory → SNr        0.08                    3
Str → SNr               10.0                    3
Reward → Str            10.0                    6

Physiologically, the SNr neurons 56 are tonically active. However, the LIF neuron of equation 1 is not capable of replicating that spontaneous activity. To compensate, a Poisson random excitatory input 68 is injected into the SNr neuron populations 56. In addition, low-level uniform random noise may be injected into the network.
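A Poisson random input of this kind is commonly approximated on a fixed time grid by drawing one Bernoulli trial per step. The sketch below assumes the 1 ms step used elsewhere in this description; the function name and the rate used in the usage note are illustrative, and the input rate needed to produce a given SNr firing rate is not stated in the text.

```python
import random


def poisson_spike_train(rate_hz, duration_ms, dt_ms=1.0, seed=None):
    """Return spike times in ms for an approximately Poisson spike train.

    Each time step emits a spike with probability rate * dt, which is a
    good approximation when rate * dt is much smaller than 1.
    """
    rng = random.Random(seed)
    p_spike = rate_hz * dt_ms / 1000.0
    n_steps = int(duration_ms / dt_ms)
    return [step * dt_ms for step in range(n_steps) if rng.random() < p_spike]
```

For instance, `poisson_spike_train(30.0, 1000.0)` yields on the order of 30 irregularly spaced spike times over one second.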

The neural network of FIG. 3 has N channels. Each of the N channels has a population of cortex neurons 50, a population of striatum neurons 52, a population of reward neurons 54, and a population of SNr neurons 56. Each channel of cortex neurons 50 is connected to each striatum neuron channel by synapses 58.

In one embodiment the populations of cortex neurons 50 are connected with equal probability and equal conductance to all of the populations of striatum neurons 52, ensuring that there is no inherent bias to a particular cortex-striatum pair. In another embodiment, the populations of cortex neurons 50 are connected randomly to the populations of striatum neurons 52.

The population of striatum neurons 52 in a channel is connected to the population of striatum neurons 52 in every other channel by synapses 60.

Reward neurons 54 receive input from environmental inputs 62, which may be sensed from the environment. Each channel of reward neurons 54 is coupled via synapses 64 only to the striatum neurons 52 of the channel of which the reward neurons 54 are part. If the environmental inputs for a channel are positive, the corresponding channel of striatum neurons 52 is rewarded and has its responses reinforced. If the environmental inputs for a channel are negative, the corresponding channel of striatum neurons 52 is punished and has its responses attenuated.

Each channel of striatum neurons 52 is connected by synapses 66 only to a corresponding channel of SNr neurons 56. A Poisson random excitatory input 68 is injected into each channel of SNr neurons 56.

The cortex neurons 50, the striatum neurons 52, the reward neurons 54, and the SNr neurons 56 may be modeled by the Leaky-Integrate and Fire (LIF) model defined by equation 1. The synapses 58, 60, 64 and 66 may be modeled by the spike-timing dependent plasticity (STDP) of equation 4.

Learning in these networks is driven by a conditioned stimulus injection. Stereotyped spiking signals may be sent to an input population and all of the reward populations. The timing of the signal is delayed for the target channel so the synaptic learning between the input population and the desired output populations is potentiated, while all other channels are depressed. The timing of these signals is dependent on the values chosen in Equation 4. Punishment signals can be injected by removing the delay from the target reward population and suppressing the activity of the other output populations.

This is only one way of exploiting the architecture of these networks to create arbitrary input/output combinations. Any Hebbian, actor-critic, reward-modulated or distal-reward learning rule can be applied to achieve the same modulation of the synaptic weights.

Similarly, the LIF neuron is only an example of a neural model that can be used. Any mathematical model capable of integrating multiple signals and converting them into discrete time events can be employed in these networks.

Finally, the specific connectivity is not crucial to the performance; increasing the number of connections per cell can improve the stability and plasticity.

The model of FIG. 1 has been implemented under the constraints of an initial memristor based neuromorphic processor. An example reward-learning scenario is illustrated in FIGS. 4A-4C. FIG. 4A shows an activity rate map of the example scenario. The activity was calculated using a moving Gaussian weighted window. FIG. 4B shows a spike raster of the input populations. FIG. 4C shows a spike raster of the output populations.
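A moving Gaussian weighted window of the kind mentioned for FIG. 4A can be sketched as a kernel rate estimate. The window width is not given in the text, so `sigma_ms=50.0` below is purely an assumed value, and the function name is hypothetical.

```python
import math


def gaussian_rate(spike_times_ms, t_ms, sigma_ms=50.0):
    """Estimate the firing rate (spikes/s) at time t_ms with a Gaussian window.

    Each spike contributes a Gaussian bump centred on its spike time; the
    kernel is normalized so that one spike integrates to one.
    """
    norm = 1.0 / (sigma_ms * math.sqrt(2.0 * math.pi))
    density = sum(norm * math.exp(-0.5 * ((t_ms - s) / sigma_ms) ** 2)
                  for s in spike_times_ms)
    return 1000.0 * density  # spikes per ms -> spikes per s
```

Evaluating this function on a grid of times for every population gives one row of an activity rate map per population.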

The stages are marked by the letters in the center of FIG. 4A. FIGS. 5A-5F show the synaptic weights at 0 sec., 10 sec., 11 sec., 21 sec., 22 sec., and 33 sec., respectively.

In stage A, the network is initialized with all input/output connections having a synaptic USE value of 0.25, as illustrated in FIG. 5A by the heat map of the average weights between input/output populations.

In stage B, a Poisson random input is injected into consecutive channels for 10 seconds to establish the basal activity of the network. The resulting average synaptic weight matrix is shown in FIG. 5B.

In stage C, alternating reward signals are sent to establish single input/output pairs. The weight matrix is now dominated by the diagonal shown in FIG. 5C.

In stage D, the repeated Poisson input signals from stage B are injected for 10 seconds. After this, the weight matrix shown in FIG. 5D demonstrates further potentiation of the established input/output pairs and a continued depression of the other connections.

In stage E, an opposite set of input/output associations is established using alternating reward signals. For stable retraining of the network, the reward protocol needs to be about twice as long as the original training. The new weight matrix is shown in FIG. 5E.

In stage F, 10 seconds of the repeated Poisson inputs illustrate the newly established input/output pairs in FIG. 5F.

To illustrate the lateral inhibition network, a pong style virtual environment was implemented. FIG. 6 is a mock-up of that environment. The position of the puck 70 in the game space is sent to a number of discretized neural channels. Each of these channels in essence represents a vertical column of the game board. The inputs are Poisson random spiking events with a rate defined by a Gaussian curve, described below. This provides a noisy input signal with overlap between channels. The networks signal, through a winner-takes-all mechanism, the position of the paddle 72.
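In the network itself the winner-takes-all competition is carried out by the lateral inhibition between output channels. For reading out the result, a simple software stand-in is to pick the output channel with the most spikes in a recent window, as in the hypothetical helpers below; the mapping of a channel to the centre of its column is likewise an assumption.

```python
def paddle_channel(output_spike_counts):
    """Index of the most active output channel; ties go to the lowest index."""
    return max(range(len(output_spike_counts)),
               key=lambda k: output_spike_counts[k])


def paddle_position(channel, n_channels):
    """Non-dimensional x position of the centre of the given channel."""
    return (channel + 0.5) / n_channels
```

For spike counts of `[0, 4, 2]` the paddle would be moved to the centre of the second column.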

Initially, the network has no knowledge or inherent understanding of how to play the game. The behavior is learned through feedback provided as reward and punishment signals encoded as random spike events. The stimulus into the network is determined by the location of the puck 70 relative to each of the spatial channels. The signal strength for each spatial channel is computed by sampling a Gaussian function based on the location of the channel. The location of the puck 70 on the map determines the peak amplitude and center of a Gaussian function defined as

$\begin{matrix}{f_{X_{c}}\left(X^{*}\right) = a\,e^{-{(X_{c} - X^{*})}^{2}/2c^{2}}} & (1)\end{matrix}$

where

a is a peak amplitude of the Gaussian function,

b is a center of the Gaussian function,

c is a spatial width of the Gaussian function, and

$X_{c}$ is the non-dimensional location of the channel.

The peak amplitude and Gaussian center are defined as

$\begin{matrix}{a = Y^{*} \cdot R_{max}} & (2)\end{matrix}$

$\begin{matrix}{b = X^{*}} & (3)\end{matrix}$

where

$Y^{*}$ is the non-dimensional location of the puck in the y dimension,

$R_{max}$ is the maximum input stimulus in spikes/s, and

$X^{*}$ is the non-dimensional location of the puck in the x dimension.
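The stimulus computation defined above can be sketched as follows. The number of channels, the value of `r_max`, and the placement of each channel at the centre of an equal-width column are assumptions made for the example; only the Gaussian form and the definitions of the amplitude and center come from the text.

```python
import math


def channel_rates(x_star, y_star, n_channels=10, r_max=100.0, c=0.05):
    """Input rate (spikes/s) for each spatial channel given the puck position.

    x_star, y_star -- non-dimensional puck location, each in [0, 1]
    c              -- spatial width of the Gaussian
    """
    a = y_star * r_max          # peak amplitude
    b = x_star                  # Gaussian center
    rates = []
    for k in range(n_channels):
        x_c = (k + 0.5) / n_channels   # channel location (assumed layout)
        rates.append(a * math.exp(-((x_c - b) ** 2) / (2.0 * c ** 2)))
    return rates
```

The resulting rates would then drive the Poisson input of each channel, so neighbouring channels receive overlapping, noisy stimulation whose strength grows with the y location of the puck.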

This is visualized in FIG. 7 for a spatial width, c, of 0.05. The reward or punishment to the network arrives when the puck 70 reaches the bottom of the game board 74. FIG. 7A shows an example stimulus map for two spatial channels. FIG. 7B shows a stimulus overlap between two consecutive spatial channels. FIG. 7C shows an example stimulus for different locations of the puck 70.

FIGS. 8 and 9 show the results for a spatial width, c, of 0.025 at FIG. 8A 0-25 sec., FIG. 8B 50-75 sec., and FIG. 8C 125-150 sec. FIG. 10 shows the overall accuracy for the model with a spatial width, c, of 0.025.

The neural networks of FIGS. 1-3 may be implemented with passive and active electronics components including transistors, resistors, and capacitors. The neural networks may also be implemented with computers or processors. One type of processor that may be used is a memristor based neuromorphic processor.

Having now described the invention in accordance with the requirements of the patent statutes, those skilled in this art will understand how to make changes and modifications to the present invention to meet their specific requirements or conditions. Such changes and modifications may be made without departing from the scope and spirit of the invention as disclosed herein.

The foregoing Detailed Description of exemplary and preferred embodiments is presented for purposes of illustration and disclosure in accordance with the requirements of the law. It is not intended to be exhaustive nor to limit the invention to the precise form(s) described, but only to enable others skilled in the art to understand how the invention may be suited for a particular use or implementation. The possibility of modifications and variations will be apparent to practitioners skilled in the art. No limitation is intended by the description of exemplary embodiments which may have included tolerances, feature dimensions, specific operating conditions, engineering specifications, or the like, and which may vary between implementations or with changes to the state of the art, and no limitation should be implied therefrom. Applicant has made this disclosure with respect to the current state of the art, but also contemplates advancements and that adaptations in the future may take into consideration those advancements, namely in accordance with the then current state of the art. It is intended that the scope of the invention be defined by the Claims as written and equivalents as applicable. Reference to a claim element in the singular is not intended to mean “one and only one” unless explicitly so stated. Moreover, no element, component, nor method or process step in this disclosure is intended to be dedicated to the public regardless of whether the element, component, or step is explicitly recited in the Claims. No claim element herein is to be construed under the provisions of 35 U.S.C. Sec. 112, sixth paragraph, unless the element is expressly recited using the phrase “means for . . . ” and no method or process step herein is to be construed under those provisions unless the step, or steps, are expressly recited using the phrase “comprising the step(s) of . . . . ”

What is claimed is:
 1. A neural model for reinforcement-learning and for action-selection comprising: a plurality of channels; a population of input neurons in each of the channels; a population of output neurons in each of the channels, each population of input neurons in each of the channels coupled to each population of output neurons in each of the channels; and a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein each channel of reward neurons is coupled only to output neurons in a channel that the reward neuron is part of; wherein if the environmental input for a channel is positive, the corresponding channel of a population of output neurons are rewarded and have their responses reinforced; and wherein if the environmental input for a channel is negative, the corresponding channel of a population of output neurons are punished and have their responses attenuated.
 2. The neural model of claim 1 wherein each population of output neurons in each of the channels are coupled to each population of input neurons in each of the channels by a synapse having spike-timing dependent plasticity behaving according to $g_{eff} \rightarrow g_{eff} + g_{effmax} F(\Delta t)$, where $\Delta t = t_{pre} - t_{post}$ and $F(\Delta t) = \begin{cases} A_{+}^{\left(\frac{\Delta t}{\tau_{+}}\right)} \\ A_{-}^{\left(\frac{\Delta t}{\tau_{-}}\right)} \end{cases}$; if $(g_{eff} < 0)$ then $g_{eff} \rightarrow 0$; if $(g > g_{effmax})$ then $g_{eff} \rightarrow g_{effmax}$.
 3. The neural model of claim 1 wherein each population of input neurons, each population of output neurons, and each population of reward neurons are modeled with a Leaky-Integrate and Fire (LIF) model behaving according to $C_{m}\frac{dV}{dt} = -g_{leak}\left(V - E_{rest}\right) + I$, where $C_{m}$ is the membrane capacitance, $I$ is the sum of external and synaptic currents, $g_{leak}$ is the conductance of the leak channels, and $E_{rest}$ is the reversal potential for that particular class of synapse.
 4. The neural model of claim 1 wherein the populations of input neurons are connected with equal probability and equal conductance to all of the populations of output neurons.
 5. The neural model of claim 1 wherein the populations of input neurons are connected randomly to the populations of output neurons.
 6. The neural model of claim 1 wherein the neural model is implemented with a memristor based neuromorphic processor.
 7. A neural model for reinforcement-learning and for action-selection comprising: a plurality of channels; a population of input neurons in each of the channels; a population of output neurons in each of the channels, each population of input neurons in each of the channels coupled to each population of output neurons in each of the channels; a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein each channel of reward neurons is coupled only to output neurons in a channel that the reward neuron is part of; and a population of inhibition neurons in each of the channels, wherein each population of inhibition neurons receive an input from a population of output neurons in a same channel that the population of inhibition neurons is part of, and wherein a population of inhibition neurons in a channel has an output to output neurons in every other channel except the channel of which the inhibition neurons are part of; wherein if the environmental input to a population of reward neurons for a channel is positive, the corresponding channel of a population of output neurons are rewarded and have their responses reinforced; and wherein if the environmental input to a population of reward neurons for a channel is negative, the corresponding channel of a population of output neurons are punished and have their responses attenuated.
 8. The neural model of claim 7 wherein: each population of output neurons in each of the channels are coupled to each population of input neurons in each of the channels by a synapse having spike-timing dependent plasticity; each channel of reward neurons is coupled to output neurons by a synapse having spike-timing dependent plasticity; the input to each population of inhibition neurons from a population of output neurons in a same channel that the population of inhibition neurons is part of is by a synapse having spike-timing dependent plasticity; and the output from each population of inhibition neurons in a channel is coupled to output neurons in every other channel except the channel of which the inhibition neurons are part of by a synapse having spike-timing dependent plasticity; wherein the spike-timing dependent plasticity of each synapse behaves according to $g_{eff} \rightarrow g_{eff} + g_{effmax} F(\Delta t)$, where $\Delta t = t_{pre} - t_{post}$ and $F(\Delta t) = \begin{cases} A_{+}^{\left(\frac{\Delta t}{\tau_{+}}\right)} \\ A_{-}^{\left(\frac{\Delta t}{\tau_{-}}\right)} \end{cases}$; if $(g_{eff} < 0)$ then $g_{eff} \rightarrow 0$; if $(g > g_{effmax})$ then $g_{eff} \rightarrow g_{effmax}$.
 9. The neural model of claim 7 wherein each population of input neurons, each population of output neurons, each population of reward neurons, and each population of inhibition neurons are modeled with a Leaky-Integrate and Fire (LIF) model behaving according to $C_{m}\frac{dV}{dt} = -g_{leak}\left(V - E_{rest}\right) + I$, where $C_{m}$ is the membrane capacitance, $I$ is the sum of external and synaptic currents, $g_{leak}$ is the conductance of the leak channels, and $E_{rest}$ is the reversal potential for that particular class of synapse.
 10. The neural model of claim 7 wherein the populations of input neurons are connected with equal probability and equal conductance to all of the populations of output neurons.
 11. The neural model of claim 7 wherein the populations of input neurons are connected randomly to the populations of output neurons.
 12. The neural model of claim 7 wherein as a response increases from output neurons of a channel of which a population of inhibition neurons is part of, the inhibition neurons inhibit the responses from populations of output neurons in every other channel.
 13. The neural model of claim 7 wherein the neural model is implemented with a memristor based neuromorphic processor.
 14. A basal ganglia neural network model comprising: a plurality of channels; a population of cortex neurons in each of the channels; a population of striatum neurons in each of the channels, each population of striatum neurons in each of the channels coupled to each population of cortex neurons in each of the channels; a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein each channel of reward neurons is coupled only to striatum neurons in a channel that the reward neuron is part of; and a population of Substantia Nigra pars reticulata (SNr) neurons in each of the channels, wherein each population of SNr neurons is coupled only to a population of striatum neurons in a channel that the SNr neurons are part of; wherein if the environmental input to a population of reward neurons for a channel is positive, the corresponding channel of a population of striatum neurons are rewarded and have their responses reinforced; wherein if the environmental input to a population of reward neurons for a channel is negative, the corresponding channel of a population of striatum neurons are punished and have their responses attenuated; and wherein each population of SNr neurons is tonically active and is suppressed by inhibitory afferents of striatum neurons in a channel that the SNr neurons are part of.
 15. The basal ganglia neural network model of claim 14 wherein: each population of cortex neurons in each of the channels are coupled to each population of striatum neurons in each of the channels by a synapse having spike-timing dependent plasticity; each population of striatum neurons in a channel are coupled to striatum neurons in every other channel by a synapse having spike-timing dependent plasticity; each channel of reward neurons is coupled to a population of striatum neurons in a same channel by a synapse having spike-timing dependent plasticity; each population of SNr neurons is coupled to a population of striatum neurons in a same channel that the population of SNr neurons is part of by a synapse having spike-timing dependent plasticity; and wherein the spike-timing dependent plasticity of each synapse behaves according to $g_{eff} \rightarrow g_{eff} + g_{effmax} F(\Delta t)$, where $\Delta t = t_{pre} - t_{post}$ and $F(\Delta t) = \begin{cases} A_{+}^{\left(\frac{\Delta t}{\tau_{+}}\right)} \\ A_{-}^{\left(\frac{\Delta t}{\tau_{-}}\right)} \end{cases}$; if $(g_{eff} < 0)$ then $g_{eff} \rightarrow 0$; if $(g > g_{effmax})$ then $g_{eff} \rightarrow g_{effmax}$.
 16. The basal ganglia neural network model of claim 14 wherein each population of cortex neurons, each population of striatum neurons, each population of reward neurons, and each population of SNr neurons are modeled with a Leaky-Integrate and Fire (LIF) model behaving according to $C_{m}\frac{dV}{dt} = -g_{leak}\left(V - E_{rest}\right) + I$, where $C_{m}$ is the membrane capacitance, $I$ is the sum of external and synaptic currents, $g_{leak}$ is the conductance of the leak channels, and $E_{rest}$ is the reversal potential for that particular class of synapse.
 17. The basal ganglia neural network model of claim 14 wherein the populations of cortex neurons are connected with equal probability and equal conductance to all of the populations of striatum neurons.
 18. The basal ganglia neural network model of claim 14 wherein the populations of cortex neurons are connected randomly to the populations of striatum neurons.
 19. The basal ganglia neural network model of claim 14 wherein a Poisson random excitation is injected into the populations of SNr neurons.
 20. The basal ganglia neural network model of claim 14 wherein uniform random noise is injected into the populations of SNr neurons.
 21. The basal ganglia neural network model of claim 14 wherein the basal ganglia neural network model is implemented with a memristor based neuromorphic processor. 
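
By way of illustration only, and without limiting the claims, the Leaky-Integrate and Fire equation and the bounded conductance update recited above may be sketched numerically as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function names, the forward-Euler integration, and all numeric values are hypothetical. In particular, the branch conditions of $F(\Delta t)$ are not recoverable from the claim text, so `example_window` assumes a conventional exponential window (potentiation when the presynaptic spike precedes the postsynaptic spike, depression otherwise); the update function itself takes the value of $F(\Delta t)$ as an input and does not depend on that assumption.

```python
import math


def lif_step(v, i_total, dt, c_m, g_leak, e_rest):
    """One forward-Euler step of C_m dV/dt = -g_leak (V - E_rest) + I.

    Forward Euler is an assumption; the claims do not name an integrator.
    Spike thresholding and reset are intentionally omitted.
    """
    dv_dt = (-g_leak * (v - e_rest) + i_total) / c_m
    return v + dt * dv_dt


def example_window(delta_t, a_plus, a_minus, tau_plus, tau_minus):
    """ASSUMED form of F(dt), with dt = t_pre - t_post.

    The claim shows only an A_+ branch and an A_- branch; the conditions
    and signs used here are a conventional choice, not the source's.
    """
    if delta_t < 0:  # presynaptic spike came first: potentiate
        return a_plus * math.exp(delta_t / tau_plus)
    return -a_minus * math.exp(-delta_t / tau_minus)  # otherwise depress


def stdp_update(g_eff, g_effmax, f_value):
    """Apply g_eff -> g_eff + g_effmax * F(dt), then clamp to [0, g_effmax]."""
    g_new = g_eff + g_effmax * f_value
    return min(max(g_new, 0.0), g_effmax)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the functions.
    v = lif_step(v=-60.0, i_total=0.0, dt=1.0, c_m=100.0, g_leak=10.0, e_rest=-70.0)
    f = example_window(-5.0, a_plus=0.1, a_minus=0.12, tau_plus=20.0, tau_minus=20.0)
    g = stdp_update(g_eff=0.5, g_effmax=1.0, f_value=f)
    print(v, f, g)
```

With no input current, the membrane potential relaxes toward $E_{rest}$, and the clamp keeps the conductance within $[0, g_{effmax}]$ regardless of the magnitude of $F(\Delta t)$, which matches the two bounding conditions recited in the claims.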